DOCUMENT RESUME

ED 151 2t3                                        SO 010 7111

AUTHOR        Eash, Maurice J.; Rasher, Sue Pinzur
TITLE         Improving Local Curriculum Adoption Decision Making through Use of Criterion-Referenced Evaluation: A Case Study in Social Studies.
PUB DATE      Mar 78
NOTE          29p.; Paper presented at Annual Meeting of the American Educational Research Association (Toronto, Ontario, March 27-31, 1978)

EDRS PRICE    MF-$0.83 HC-$2.06 Plus Postage.
DESCRIPTORS   Academic Achievement; *Case Studies (Education); Course Objectives; Criterion Referenced Tests; Curriculum Development; *Curriculum Evaluation; Curriculum Planning; Curriculum Research; *Decision Making; Educational Research; Elementary Education; Elementary School Curriculum; *Evaluation Methods; Field Studies; Group Tests; Inservice Courses; Instructional Materials; Item Analysis; Mastery Tests; Norm Referenced Tests; Selection; *Social Studies; *Textbook Selection



ABSTRACT 



This case study recounts the attempts of two school districts to improve decision making on the adoption of a new elementary social studies curriculum by using formal evaluation methodology. The study's main objective was to develop a series of criterion referenced (mastery level) tests from a list of social studies objectives to determine whether these objectives were taught more effectively by the new curriculum. Other objectives included: developing a firm data base for decision making; involving teachers and principals in a field test to assess curriculum objectives; and projecting a plan for an inservice program. Drawing on the technical resources of the University of Illinois, the districts conducted a field evaluation of the new curriculum. Two groups of students were administered the tests; the experimental group received instruction in the new curriculum, while the control group was exposed to the more traditional curriculum. A total of 1086 students in 48 classrooms participated. Findings indicated the experimental group consistently made higher gains in achievement than the control group. It was also concluded that although criterion referenced evaluation provided important information, traditional test analysis measures proved to be more useful for making decisions on adoption and for planning teacher inservice programs. (Author/JK)

Introduction

This case study recounts the attempt of two school districts to improve decision making on the adoption of a new curriculum (grades 1-6) using formal evaluation methodology. Drawing on the technical resources of a university, they conducted a field evaluation of a new social studies curriculum based on a set of criterion-referenced tests which were cooperatively developed by a committee from the school districts and the university. The committee from the school districts was interested in examining whether the curriculum teaches the complex objectives listed by the curriculum developers (which were based in the discipline of economics). As one aspect of the evaluation, the committee set as a priority the development of a set of criterion-referenced tests which would be used in a pre-post test design to measure achievement gains. Two groups of students were administered the tests; the experimental group received instruction in the curriculum while the control group was exposed to a more traditional curriculum. This case study is not an attempt to resolve the issues in the use of criterion-referenced testing (CRT) versus norm-referenced testing (NRT); however, it illustrates how a number of these issues insistently came to the fore as the decision-making process on the curriculum advanced toward closure. More specifically, early in the study, the committee adopted a stance based on an assumption that criterion-referenced testing had marked advantages over norm-referenced measures. In particular, the committee felt the use of criterion-referenced testing would avoid teacher objections to comparisons of student achievement among classrooms. Thus, the committee would meet the growing demand for accountability in selection and use of materials by embarking on a revised approach to adopting a new curriculum: an empirical testing of the curriculum before investing in a system-wide adoption.

1. Norm referenced tests have traditionally been recommended for comparing the effectiveness of two or more curricula; however, the districts wanted to compare curricula and measure mastery of concepts of the new curriculum. Lack of time and resources precluded the development and administration of both criterion-referenced and norm-referenced tests. Thus, the districts had to choose which kind of test should be developed and administered.

The objectives of the evaluative study were:

1. To develop a firmer data base for decision-making in adopting new curriculum in the two school districts.

2. To develop a series of criterion-referenced tests over a list of social studies objectives to determine whether these objectives were taught more effectively by the new curriculum.

3. To involve a group of teachers and principals in a field test of the new curriculum as an inservice effort to assess the objectives and direction of the social studies curriculum.

4. To project a plan for a continuing inservice program on the social studies curriculum from the findings of the evaluative study.

Instructional materials represent a major time commitment for students, since up to 80 percent of a student's classroom hours are spent engaging materials (EPIE, 1977a). Despite this known time commitment and the increasing awareness of the importance of the time variable in learning (Carroll, 1963), the selection and adoption of instructional materials has been haphazard in many school districts, with teachers spending as little as one hour per school year in the process. Nevertheless, as the major foci of the curriculum which frequently exclusively dictate the dominant instructional patterns, instructional materials are major targets of teacher and community discontent (EPIE, 1977b).

In the choice of instructional materials school districts are often reliant upon subjective impressions of faculty, book publishers' pitches, and other school district or state authorities' recommendations. Among the curricula in the elementary school, social studies materials are particularly prone to criticism due to the lack of public agreement on core objectives and the controversial nature of social studies content. Social studies in the elementary school has drawn its content from the disciplines of history and geography more than other social sciences. Thus when a new curriculum was found which drew its content heavily from economics--though it addresses social systems problems that have long been standard fare in some form in the elementary curriculum (family roles, community interdependence, governmental structures and roles)--the curriculum committee felt a need to acquire firmer data for decision-making in recommending adoption or rejection of the social studies curriculum, Our Working World (Science Research Associates, 1973-74).



Methodology

The methodology involved designing a field test to evaluate the instructional materials against a set of social studies objectives which were accepted by the curriculum committee. This involved a number of steps, but the one of special concern was the development of a series of criterion referenced tests (CRT) which were to be used in the evaluation and for future testing in the social studies curriculum, providing it was adopted.

The curriculum did not come with prepackaged general objectives and the curriculum committee requested the publisher (SRA) to prepare these for each grade level. Twelve objectives were prepared for each grade 1 through 6. After examining these objectives and pronouncing them valid for their social studies program, the curriculum committee from the two districts requested the Office of Evaluation Research at the University of Illinois at Chicago Circle to work with them in preparing items which would test student mastery of the concept in each of the twelve objectives.




Six tests were developed: for grades one and two there were 24 items per test, two per concept; for grades 3-6 there were 48 items per test, four per concept. These item limits were set in the interest of minimizing testing time and building an instrument which could be administered at one sitting. A sample of three of the objectives and the items which were written to test them is given in Appendix A. It was soon discovered that objectives of this complexity pose unique difficulty in item writing for they do not lend themselves to neat learning hierarchies where mastery or competency can be defined as a behavior which is needed to move to the next higher level of instruction. In planning the tests, three levels of refinement of items were used. First, a group of teachers at each level were asked to inspect the items and judge their suitability on two criteria: 1) readability and 2) accuracy in testing the concepts in the objectives. Second, three children at each grade level were given an individual administration of the test to check on readability. Third, a team of experts in social studies and elementary education screened the items for readability and accuracy in testing the concepts in the objectives.



The test for grade one was composed of pictorial items, the grade two test had mostly pictorial items, grade three had a few pictorial items and the other grades used the conventional multiple choice and sequencing items. Teachers were given written instructions on test administration. In the lower grades the test was read aloud by the teacher and was untimed at all grades. Teachers were cautioned not to interpret or explain any items for children.



Mastery levels for each of the grades were set by taking the mean of a series of judgments of experts, a system used by ETS in the Michigan State Assessment Tests. The mastery levels for each grade as set were:

Grade:            1      2      3      4      5      6
Mastery Level:   80%    75%    70%    70%    70%    70%
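The standard-setting step described above (averaging a series of expert judgments) can be illustrated with a minimal sketch. The paper does not report the individual judgments or the number of experts, so the values below are hypothetical and chosen only so that their mean equals the 80% cutoff reported for grade one.

```python
from statistics import mean


def mastery_cutoff(judgments):
    """Return the mean of expert judgments, each given as a percentage (0-100)."""
    if not judgments:
        raise ValueError("at least one expert judgment is required")
    return mean(judgments)


# Hypothetical judgments from five experts for one grade level (not from the paper).
example_judgments = [85, 80, 75, 80, 80]
cutoff = mastery_cutoff(example_judgments)
print(f"Mastery cutoff: {cutoff}%")
```

A mean is sensitive to a single extreme judge; whether the original procedure trimmed or weighted judgments is not stated in the paper.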

Eight classes were chosen at each grade level and were randomly assigned as control or experimental classes. The latter used the new curriculum Our Working World for social studies after the pre-test was administered. The control group used several more conventionally designed social studies programs that had been in use in the two districts. A pre-test was administered to all students in September before implementation of the social studies curricula and a post-test was administered in May after the experimental classes had been exposed to all the objectives. A total of 1,086 students took both the pre and post tests.

/ . ' * ' • .< - ' * 

Two levels of analyses were run at each grade. The percent of students answering each item correctly and the percent of students mastering each concept, i.e., answering all the items (2 at grade levels 1 and 2, and 3 of 4 at grade levels 3-6) correctly, were computed. For reasons to be discussed later, split-half reliabilities, an item discrimination index, and an item difficulty index were also computed.
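The two CRT summaries named above (percent of students answering each item correctly, and percent of students mastering each concept) can be sketched as follows. This is an illustration under stated assumptions, not the evaluators' actual procedure: the response matrix is hypothetical, items are assumed to be stored concept by concept, and the number of correct items required for concept mastery is left as a parameter (`required`) because the rule for grades 3-6 is hard to read in the source.

```python
def percent_correct_by_item(responses):
    """responses: one list of 0/1 item scores per student."""
    n_students = len(responses)
    n_items = len(responses[0])
    return [
        100.0 * sum(student[i] for student in responses) / n_students
        for i in range(n_items)
    ]


def percent_mastering_concepts(responses, items_per_concept, required):
    """Percent of students answering at least `required` items of each concept."""
    n_students = len(responses)
    n_concepts = len(responses[0]) // items_per_concept
    result = []
    for c in range(n_concepts):
        start = c * items_per_concept
        masters = sum(
            1
            for student in responses
            if sum(student[start:start + items_per_concept]) >= required
        )
        result.append(100.0 * masters / n_students)
    return result


# Four hypothetical students, two concepts of two items each (grade 1-2 layout).
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 1, 1],
]
item_pct = percent_correct_by_item(responses)
concept_pct = percent_mastering_concepts(responses, items_per_concept=2, required=2)
print(item_pct)
print(concept_pct)
```

Note that three of the four items are answered correctly by 75% of these students, yet only half the students master either concept; item-level and concept-level percentages can diverge sharply when each concept rests on very few items.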

Results

The results of the pre and post tests were first examined for the experimental and control group by item and by concept.

Table A below summarizes the mean percentage of experimental (E) and control (C) students answering the items correctly. The scores are calculated by taking the sum of the percentage of students answering each item correctly (Σ) and dividing that by the number of items (N), where Σ = % correct item 1 + % correct item 2 + ... + % correct item N.



Table A

Summary of Mean Percentage Item Mastery of Pre and Post Tests

                  Experimental                      Control
Level    Pretest   Posttest    Gain       Pretest   Posttest    Gain

  1       69.29     80.04     10.75        73.20     79.62      6.42
  2       56.29     67.54     11.25        64.29     75.83     11.54
  3       65.81     77.10     11.29        63.85     71.79      7.94
  4       39.77     48.51      8.74        44.02     49.60      5.58
  5       47.79     53.60      5.81        45.67     49.60      3.93
  6       55.60     63.31      7.71        54.69     59.13      4.44

An examination of Table A shows that experimental students in Levels 1 and 3 reached specified mastery criteria on the posttest. For the control group, students in Levels 2 and 3 reached specified mastery cutoffs. More interesting is the amount of gain from the pretest to the posttest, also shown in Table A. The experimental group made higher gains for Levels 1, 3, 4, 5, and 6 and about the same at Level 2 although the experimental group started from a much lower level. These findings are graphed in Figure 1, which presents the amount of gain, and Figure 2, which displays in graphic form pre and post mastery for both groups.
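The comparisons drawn from Table A can be checked with a short sketch. The pretest and posttest values below are a reading of Table A from a poorly reproduced copy, cross-checked by the arithmetic pretest + gain = posttest; they should be treated as assumptions (the Level 4 experimental figures in particular). The cutoffs are the mastery levels reported in the Methodology section.

```python
# (pretest, posttest) mean percentage item mastery, as read from Table A.
# Assumption: digits reconstructed from a degraded copy, checked by arithmetic.
TABLE_A = {
    1: {"E": (69.29, 80.04), "C": (73.20, 79.62)},
    2: {"E": (56.29, 67.54), "C": (64.29, 75.83)},
    3: {"E": (65.81, 77.10), "C": (63.85, 71.79)},
    4: {"E": (39.77, 48.51), "C": (44.02, 49.60)},
    5: {"E": (47.79, 53.60), "C": (45.67, 49.60)},
    6: {"E": (55.60, 63.31), "C": (54.69, 59.13)},
}

# Mastery cutoffs (percent) by level, as reported in the paper.
CUTOFFS = {1: 80, 2: 75, 3: 70, 4: 70, 5: 70, 6: 70}


def gain(pre, post):
    """Pre-to-post gain in percentage points, rounded to two decimals."""
    return round(post - pre, 2)


gains = {
    level: {group: gain(*scores) for group, scores in groups.items()}
    for level, groups in TABLE_A.items()
}

# Levels at which each group's posttest mean reaches the mastery cutoff.
reached = {
    group: [lvl for lvl in TABLE_A if TABLE_A[lvl][group][1] >= CUTOFFS[lvl]]
    for group in ("E", "C")
}

# Levels at which the experimental gain exceeds the control gain.
higher_experimental_gain = [
    lvl for lvl in gains if gains[lvl]["E"] > gains[lvl]["C"]
]

print(reached)
print(higher_experimental_gain)
```

Under these readings the sketch agrees with the text: the experimental group reaches the cutoff at Levels 1 and 3, the control group at Levels 2 and 3, and the experimental gain is larger at every level except Level 2.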

Table B reports the number of items for each level for which the experimental group students attained mastery levels on the posttest. For example, for Level 1, 17 items were mastered by 81 to 100% of the group, 4 items by 51 to 60%, and 3 items by 21 to 40%. (The reader is reminded that there are only 24 items for Levels 1 and 2, and 48 items for Levels 3-6.)
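The kind of tally reported in Table B can be sketched as a simple banding of item-level percentages. The band boundaries below follow the Table B column headings as best they can be read, and the item percentages are hypothetical values chosen only to reproduce the Level 1 example in the text (17, 4, and 3 items); they are not the study's data.

```python
from bisect import bisect_left
from collections import Counter

# Band labels as read from the Table B column headings (an assumption).
BAND_LABELS = ["0-20%", "21-40%", "41-50%", "51-60%", "61-70%", "71-80%", "81-100%"]
# Inclusive upper bound of every band except the last.
BAND_UPPER = [20, 40, 50, 60, 70, 80]


def band_counts(item_percentages):
    """Count items falling in each band of 'percent of students mastering'."""
    counts = Counter(
        BAND_LABELS[bisect_left(BAND_UPPER, pct)] for pct in item_percentages
    )
    return {label: counts.get(label, 0) for label in BAND_LABELS}


# Hypothetical Level 1 item percentages (24 items).
hypothetical_level_1 = [90.0] * 17 + [55.0] * 4 + [30.0] * 3
level_1_counts = band_counts(hypothetical_level_1)
print(level_1_counts)
```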



[Figure 2. Social Studies Criterion Referenced Tests: Summary of Mean Percentage Item Mastery, Pre and Post Test Scores. Vertical axis: Percent Mastery.]

Table B

Breakdown of Percentage of Experimental Group Students Mastering Items

                Percentage of Students Mastering Item
Level    0-20%    21-40%    41-50%    51-60%    61-70%    71-80%    81-100%

Mastery of concepts can be examined similarly. Table C presents information regarding the percentage of students mastering concepts.

Table C

Breakdown of Percentage of Experimental Group Students Mastering Concepts

Level

Table C shows that for Levels 1, 2, 3 and 6 students approached mastery on at least half of the concepts.

For Level 5, mastery was incomplete for both experimental and control group students. For experimental students it is uncertain whether this is a product of problems with the test or if the objectives are not being learned by students due to the absence of content or inadequate presentation. Therefore, no definite statement can be made by the evaluators although, as we discuss later, we have reason to believe that the new curriculum content was difficult for teachers as well as children in the first teaching cycle. The fact that experimental students did make somewhat higher gains than did control students indicates that giving attention to the objectives and teaching could produce more positive results for experimental students next year.

The results for Level 4 indicate that the majority of concepts are not being mastered by experimental students. In the Level 4 analysis, it was found that the test items have high curriculum validity and good discriminating power. Therefore, it was recommended that the instructional practices in teaching Level 4 concepts be examined. Students may need greater exposure and may require more practice in using the concepts.

Finally the reliability of the post test was computed. Listed below are the reliabilities calculated for both experimental and control groups of the post test for all 6 levels.

Table D

Summary Reliability Total Tests

Experimental    .71
Control         .07




• .'>./ 
:-8f»./' .77 



The reliabilities of Levels 3, 5 and 6 are in the satisfactory range. The results for Level 1 are mixed; while the reliability for the experimental group is certainly acceptable, especially considering the fact that the test is completely pictorial, the reliability for the control group, following general test theory, is quite low. It is uncertain why the control reliability is so low. A great deal of guessing by control group students on the post test as the result of little or no exposure to the concepts might be one explanation. Since we are uncertain on the source of the low control reliability, we recommended that the reliability be checked again next year to see if problems with the test are indicated or if these results were idiosyncratic to this group. The reliability of Level 2 is not as high as would be preferred. A number of suggestions were given that might help increase reliability in the future, principally along the line of whether the content was being taught to children. The bulk of the report rendered to the school systems dealt with in-depth discussions of the results by level and their use in improving tests, curriculum and the instruction.
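For readers unfamiliar with the statistic, a split-half reliability of the kind reported in Table D can be sketched as follows. The paper does not say how the halves were formed or whether a length correction was applied; this sketch assumes an odd-even split and the Spearman-Brown step-up, which is a common classical choice, and uses a small hypothetical response matrix.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def split_half_reliability(responses):
    """Odd-even split-half reliability with the Spearman-Brown correction."""
    odd_scores = [sum(student[0::2]) for student in responses]
    even_scores = [sum(student[1::2]) for student in responses]
    r_half = pearson(odd_scores, even_scores)
    return 2 * r_half / (1 + r_half)


# Five hypothetical students on a four-item test (1 = correct, 0 = incorrect).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
]
reliability = split_half_reliability(responses)
print(round(reliability, 3))
```

Heavy guessing, offered above as one explanation for the low control reliability, would push the correlation between halves toward zero, which is the behavior this statistic is sensitive to.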

As originally requested by the school districts, the curriculum committee and classroom teachers received only the analysis discussed above, including the criterion referenced data on percentage of students mastering items and concepts by classroom. Following the delivery of the analysis report to the committee, OER - for its own information - examined the data further, using more sophisticated analyses including the computation of indexes of difficulty and indexes of discrimination for all items, for both the experimental and control populations. After a brief period of study the committee returned to OER and requested more "statistical" information which they believed would be more "helpful" in making a decision on the curriculum and for revision of the tests. The data which had been prepared by OER for in-house use only was given to the districts and the final report was written incorporating both levels of analysis. Why did the curriculum committee and teachers change their minds when they had been such ardent advocates of criterion-referenced measurement at the beginning of the project? We believe their change of mind when confronted with a need for a decision was an expression of some of the unresolved problems in CRT and indicates a narrower range of usefulness of CRT in curriculum decision-making than some of its enthusiasts have proclaimed.

Discussion

CRTs have been used in instructional sequences; they have not been used widely to evaluate materials for curriculum decision making. In their individual comparison approach they were considered by the curriculum committee to be a very attractive alternative when an evaluation of Our Working World was considered and NRT approaches were rejected. The remainder of the paper looks at the issues that emerged on CRT when used in the context of curriculum evaluation.

Criterion referenced tests (CRT) and curriculum adoption

CRTs have been used primarily to shape instruction for the individual learner. The commonly accepted definition of a CRT, "A criterion-referenced test is one that is deliberately constructed so as to yield measurements that are directly interpretable in terms of specific performance standards" (Glaser and Nitko, 1971), places an emphasis on individual performance against specific objectives. Thus objectives are written to contain specific criteria for the judgment of a learner's performance.

In the evaluation of a curriculum or program, however, the needed information is different; a distribution of variance in performance is desirable and needed. The individual learner question, can the student perform to criteria?, is viewed as essentially a yes or no decision. As presented in the one widely accepted model of CRTs the approach does not provide the information that is needed by the decision maker to recommend adoption of a curriculum.



If a group of students perform to criteria then the curriculum may be too easy; i.e., students already have knowledge of the material and therefore do not need further instruction. If few perform to criteria what should be done; is the curriculum too difficult, or is the text at fault, are the objectives inappropriate, or have the performance objectives overshot the learning hierarchies necessary for their accomplishment? In any of these circumstances CRT gives limited and often not very useful information in aiding the curriculum decision maker. For example, CRT will not answer the question: If this curriculum is adopted, what provisioning costs are going to be incurred? What elements will have to be added by the teacher or school district to make it reasonably acceptable to large numbers of students of varying abilities?

A breakdown of the measures of achievement in NRTs does give more data on these questions. For example, an examination of the discrimination index on each item suggested that teachers' in-service was needed to emphasize the content to be taught and to supplement the content of the curriculum. Good learners as well as poor learners were misled by distractors, giving evidence of lack of knowledge on content. We have good reason to suspect the low reliabilities on concepts in Level 4 and 5 tests were functions of the teaching and not of the tests; i.e., students were not taught or given specific practice in the content and consequently gains were measures of increases in general education and not the curriculum. We came to this conclusion by checking the discrimination of items within the concepts and found that several did have good discrimination indexes as opposed to the total concept which had low reliability. In this case NRT data on reliability and discrimination gave more information related to the costs of provisioning if the curriculum was adopted than the CRT data on percentage of students obtaining mastery of each item or concept.
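The item statistics referred to above can be illustrated with a minimal sketch of a classical difficulty index (proportion answering correctly) and an upper-lower discrimination index. The paper does not state which discrimination formula OER used; the upper-lower group method and the `fraction` parameter are assumptions, and the response data are hypothetical.

```python
def item_indices(responses, item, fraction=0.27):
    """Return (difficulty, discrimination) for one item.

    difficulty: proportion of all students answering the item correctly.
    discrimination: proportion correct in the top-scoring group minus the
    proportion correct in the bottom-scoring group (upper-lower method).
    """
    ranked = sorted(responses, key=sum, reverse=True)
    group_size = max(1, round(len(ranked) * fraction))
    upper, lower = ranked[:group_size], ranked[-group_size:]
    difficulty = sum(student[item] for student in responses) / len(responses)
    discrimination = (
        sum(student[item] for student in upper)
        - sum(student[item] for student in lower)
    ) / group_size
    return difficulty, discrimination


# Six hypothetical students on a three-item test.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
    [0, 0, 1],
    [0, 0, 0],
]
first_item = item_indices(responses, item=0, fraction=1 / 3)
third_item = item_indices(responses, item=2, fraction=1 / 3)
print(first_item, third_item)
```

In this toy data the third item is answered correctly by half the students but does not separate high scorers from low scorers at all; a pattern like that, where able students are misled as often as weak ones, is the sort of evidence the paragraph above describes.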



The use of CRTs may be inappropriate as a curriculum evaluation tool from another viewpoint. An implicit assumption in the items is that the content and behaviors sought by the curriculum are accurately identified. Hence the only problem in evaluating instruction is to see whether these goals are being mastered by the learner. Essentially the evaluation results are oriented to questions of time and methodology but not to whether the curriculum content is of sufficient scope and whether it is related to significant social content.

Under this assumption major curriculum issues are sidestepped and the major weight of the evaluation is placed upon the vehicle of instruction. Social interests, social concerns and the interrelationship of concepts to values are more likely to be overlooked in the technology of evaluation employed.

Setting Mastery Levels

The literature reflects a range of views on setting levels of mastery and there is little theoretical agreement on what constitutes mastery. There seems to be a growing feeling that mastery levels stated in percentage figures are set arbitrarily and bear limited relationship to reality. At best they seem to be set to compensate for measurement error and student variability which cannot be squeezed out by a rigorous performance based model. In the attempt to escape from the dilemma of fixed performance standards in the State model, CRT based on a Continuum model (where mastery is a point on a continuum below which students will not be passed rather than a single standard of performance) is being advocated (Meskauskas, 1976). The curriculum committee did not find the setting of standards of performance useful inasmuch as it failed to provide direct information on how this curriculum compared with other curricula. The levels of performance were set with a floating reference existing in the minds of "experts". The final decision on adoption rested on whether the experimental group gained more than the control group. Without the control group the customary way of presenting CRT data (number of students reaching mastery vs. non-mastery) would probably have resulted in a negative decision on adoption. Pre and post measures for the experimental group were made more meaningful when the control group's gains were introduced for comparison, especially in those areas of formal social systems that Our Working World is especially designed to teach. Percentages of mastery arbitrarily set were discarded in the final decision-making on program adoption as they failed to bring any meaningful data forward on the comparative worth of curriculum and what is needed if students are to perform acceptably in the curriculum design of the new program. In short there is the large question of whether the setting of a percentage of mastery represents meaningful learning in a broad concept focused program. Our judgment was that it does not and, if used in program evaluation alone, may cause a program to be rejected on a basis that excludes other valuable learnings that would be made available if more adequate curriculum provisioning were arranged. The structuring of conclusions through different ways of reporting data on CRTs has been emphasized, and concentrating on mastery or non-mastery can exaggerate or diminish the impact of the curriculum (Barta, et al., 1976). The use of the fixed level of mastery in a curriculum evaluation which extends beyond skills is, we believe, a poor practice as it can lead to a premature judgment on the rejection of a program.



Macro-Objectives and CRTs

Program evaluation in social studies is concerned with macro-objectives which embrace a cluster of behaviors and not a specific behavior as is found in skill performances. While there are specific skills in social studies, the macro-objectives which are of special interest include a number of behaviors which have to be directed and synthesized in fashioning the totality of the learning. As an example, the objective from Level 3, "The student will explain how the city can be thought of as a system comprised of several interdependent parts," requires a synthesis of many learnings and not a specific isolated skill. As framed, the objectives produced a number of problems for the test developers which current models of CRT theory do not address. One response to this criticism might be to suggest that the macro-objective be broken down into a hierarchy of learning tasks which then are successively mastered (Gagne, 1967). Once the specific tasks are broken out, then the individual behaviors can be taught to and synthesized by the learner into the behavioral cluster that is involved in the citizenship understanding of a city as an interdependent system. Acting on this instructional design advice poses special problems for CRTs.

In breaking down the macro-objective, one quickly builds a lengthening list of behaviors each of which requires a series of items if learner performance is to be directly assessed. An additional question then intrudes. If a student masters the series of specific items, will these add up into a behavioral pattern that is called into action when the student is confronted with a problem that calls for, as in this case, an individual's analysis of a city as a system with interdependent parts? There is no evidence that behavioral patterns that result in solid citizens (presuming solid citizens are knowledgeable about organized complexities) are developed through the learning of discrete behaviors that can be turned into test items easily amenable to the performance standards of a CRT. The process is excessively cumbersome; its end result is a very lengthy series of tests to gauge learner performance and is, in our judgment, scarcely an alternative. Moreover the process trivializes subject matter as it atomizes it into discrete parts and raises fundamental questions on whether this is the mode for effective learning to take place. At the heart of this issue is the whole-part learning controversy. It is no small heresy to raise questions about the theory of learning being pursued, especially with rampant behaviorism now holding sway in curriculum and instruction, but for curriculum evaluation which purports to use CRT as its vehicle it is an issue which must be faced.

From this experience in social studies the domain of reference for test items cannot exclusively be specific skill oriented performance, especially when our interests are on macro-objectives which encompass broad behavioral patterns. This suggests that classical test theory with its NRTs may be a more significant anchor - one that is more comprehensible without excessive cost and measures citizenship behavior in a more readily interpretable context. Despite the seeming paradox, relative standards as expressed in NRTs of students' performance are more stable and socially more meaningful than CRTs based on the judgment of individual instructors in a limited item (1 to 4 per concept) approach. CRT theory has not addressed this issue, operating as it does on some implicit assumptions concerning learning. It will have serious limitations when employed in curriculum evaluation where macro-objectives are of primacy.

One technical advance which may be of use to curriculum evaluators in
trying to bypass the problem of limited testing time is suggested by work on
the number of items that theoretically must be administered to obtain test
reliability (Davis and Diamond, 1974; Millman, 1973). Their calculations
suggest that if performance based objectives are to be used in curriculum
evaluation, the number of items would have to be increased manyfold from the
2 to 4 per concept that were used in this study. If curriculum evaluators use
the estimate of Davis and Diamond that a test of 20 items per objective
should be used, then test sampling by student with very large samples of
students would be required for a curriculum evaluation, presuming that each
performance specifically measured would require a test of twenty items.
Economy of time and resources in curriculum evaluation usually imposes the
restriction of a few items in CRTs to check mastery by the learner.
Unfortunately the few items do not give sufficient range of a dimension of
behavior, which is the strongest single reason for employing the change to
CRT. Our teachers were most uncomfortable with the CRT results, as the
enormity of generalizing the mastery of the concepts rested on such slim item
evidence. While test sampling has been used in one national curriculum
evaluation (Walberg, 1970), it is probably not feasible at the district level
for reasons of sample availability and cost.
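The unease about slim item evidence can be illustrated with a simple binomial
model, in the spirit of the test-length calculations cited above. The sketch
below is an illustration only and not the procedure used in this study: the
function name, the 80 percent mastery cutoff, and the true score of 0.60 are
all hypothetical, and items are assumed to be independent and equally
difficult.

```python
from math import ceil, comb


def pass_probability(true_score, n_items, cutoff):
    """Chance that a student whose true proportion correct is `true_score`
    reaches the mastery `cutoff` on a test of `n_items` independent items
    (simple binomial model; an assumption, not the study's method)."""
    # Smallest number of correct answers that meets the cutoff.
    needed = ceil(cutoff * n_items - 1e-9)
    return sum(
        comb(n_items, k) * true_score**k * (1 - true_score) ** (n_items - k)
        for k in range(needed, n_items + 1)
    )


# Hypothetical non-master (true score 0.60) judged against an 80% cutoff
# with 2, 4, and 20 items per objective.
for n in (2, 4, 20):
    print(n, round(pass_probability(0.60, n, 0.80), 3))
```

Under these assumed values a two-item test classifies the non-master as a
master whenever both items are answered correctly (0.6 x 0.6 = 0.36), while a
twenty-item test does so far less often.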

In the social studies curriculum with broad macro-objectives, the interest
of teachers proved to be in the range of the dimension of social behavior (in
a broad sense) and in obtaining the best description or understanding within
a domain. Can the dimension of social behavior best be understood as set
forth in specific elements of subject matter which are tested by CRTs
(mastery-nonmastery), or is it best described by measures which are
descriptive of its distribution within a population (NRT)? The curriculum
committee, after examining both sets of data described previously, concluded
that measures which are descriptive of population (or sample) distribution
are most useful to curriculum adoption decision-making. These measures are
the identifying mark of NRTs, which give a calibrated measure of the
distribution of population characteristics. Woodson (1974) argues that
variance in the test items is critical to providing useful information, and
CRTs, in restricting this range in the items, may simply be limiting
information. Which is more representative of the way we judge social
behavior? Our evidence in this study suggests that a normative judgment is.



The absolute cut of the State model's all or none approach in the
mastery-nonmastery CRT proved troublesome to teachers although, as previously
explained, the CRT model originally had appeal in avoiding comparisons of
individuals and classrooms. The committee quickly discovered, by examining
CRT scores, that the social behaviors in these tests are a matter of degree
judged within a population and are not dichotomous nor even accurately
represented in a continuum model. Thus a composite test score that is placed
within a known population was more informative than one which was established
for the individual. The committee concluded that relative scores of NRTs are
a more stable indicator of students' social learning than fixed absolute
scores based on expert estimates, especially for purposes of curriculum
adoption.
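The contrast the committee drew can be made concrete with a small sketch. It
is offered only as an illustration under stated assumptions: the scores, the
mastery cutoff of 24, and the function names are invented for the example and
are not data or procedures from this study.

```python
from statistics import mean, pstdev


def mastery(score, cutoff):
    """Criterion-referenced view: a fixed, all or none judgment."""
    return score >= cutoff


def z_score(score, population):
    """Norm-referenced view: distance from the population mean in SD units."""
    return (score - mean(population)) / pstdev(population)


def percentile_rank(score, population):
    """Percent of the population scoring below `score`, counting ties as half."""
    below = sum(s < score for s in population)
    equal = sum(s == score for s in population)
    return 100.0 * (below + 0.5 * equal) / len(population)


# Hypothetical composite scores for ten students in a district.
district_scores = [12, 15, 18, 18, 20, 22, 25, 27, 30, 33]

for s in (18, 22, 27):
    print(s, mastery(s, cutoff=24), round(z_score(s, district_scores), 2),
          percentile_rank(s, district_scores))
```

The mastery call reduces each score to pass or fail against the preset
standard, whereas the z-score and percentile rank place the same score within
the known distribution, which is the kind of information the committee found
more useful.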



Evaluative Considerations 






In the development of the tests the curriculum committee early on
encountered a problem in the field testing. Field tests were conducted first
with teachers and secondly with students to see if the tests' readability was
suitable for the several levels. A copy of the tests and objectives was
distributed to two teachers at each level for the purpose of obtaining their
judgments on readability and curriculum validity. As a group they were very
critical of the pictorial items and suggested many graphic as opposed to
content changes. Fortunately the changes were not made prior to a field test
with students; students evidenced no difficulty with the pictorial items on
readability although they found the content difficult. This experience casts
further doubt on the usefulness of mastery levels set a priori by teachers as
creditable standards for judging pupil performance.

In this type of curriculum evaluation the new material is at a great
disadvantage, as teachers are on the first cycle of teaching. A heavy burden
is placed on the materials' instructional design as teachers learn the
material with the children. (In the above field test with teachers, teachers
found many of the test items difficult and admitted they probably would not
score well on the upper level tests.) The scores obtained, either as
composite or on mastery of items, may not be representative of what would be
obtained in a second cycle of teaching the curriculum. Therefore the findings
generated for curriculum evaluation undoubtedly represent a conservative
estimate of performance by students and teachers. However the results, when
analyzed by item discrimination, distractor counts and level of difficulty,
offered extensive information on the curriculum and instruction provisioning
needed if students were going to be successful. Of particular interest were
the large gaps in students' knowledge that became apparent; e.g., a majority
of students indicated that the chief executive of Illinois is the mayor.
Extensive in-service suggestions for instructional time, technique and
content were drawn for each level from the standard norm referenced
statistics and the distractor counts which are given.
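For readers unfamiliar with the three item statistics named above, the
following sketch shows one common way of computing them. It is a minimal
illustration, not the analysis actually run for this study: the function
name, the upper and lower thirds used for the discrimination index, and the
response data are all assumptions made for the example (the responses are
invented, loosely modeled on the "chief executive of Illinois" item).

```python
from collections import Counter


def item_analysis(responses, totals, correct):
    """responses: option chosen by each student on one item
    totals:    each student's composite test score
    correct:   the keyed (right) option
    Returns (difficulty, discrimination, distractor_counts)."""
    n = len(responses)
    # Level of difficulty: proportion of students answering correctly.
    difficulty = sum(r == correct for r in responses) / n
    # Upper-lower discrimination index: proportion correct in the top third
    # on total score minus proportion correct in the bottom third.
    order = sorted(range(n), key=lambda i: totals[i])
    g = max(1, n // 3)
    low, high = order[:g], order[-g:]
    p_low = sum(responses[i] == correct for i in low) / g
    p_high = sum(responses[i] == correct for i in high) / g
    # Distractor counts: how often each wrong option was chosen.
    distractors = Counter(r for r in responses if r != correct)
    return difficulty, p_high - p_low, distractors


# Invented responses from nine students to a single item.
responses = ["mayor", "mayor", "governor", "mayor", "governor",
             "mayor", "president", "governor", "mayor"]
totals = [10, 12, 30, 14, 28, 11, 9, 27, 15]

difficulty, discrimination, distractors = item_analysis(
    responses, totals, correct="governor")
print(difficulty, discrimination, dict(distractors))
```

A distractor chosen by most students, as "mayor" is in this invented data,
points to a specific misconception that in-service planning can address,
which a mastery-nonmastery score alone would not reveal.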




Conclusion

There is increasing interest in criterion-referenced testing as curriculum
emphases shift to individual mastery of concepts from the traditional
norm-referenced scores of standardized achievement testing. The value of
criterion-referenced measurement and its relationship to classical test
theory has been the subject of debate (Bernkopf and Bashaw, 1976). How useful
the information obtained through criterion-referenced measures is in
decision-making on selection of instructional materials is a question that
has not been investigated as carefully as has the use of CRTs in guiding
instruction. This study, completed in forty-eight classrooms in two school
districts, found that teachers' predispositions toward criterion-referenced
tests were weakened when they received the information on children's
performance and were asked to make a decision on adoption of instructional
materials based on these results. They requested the traditional item
analysis data and central tendency measures as well as the profiling of
classes against the district's mean scores. Item discrimination scores and
distractor counts were seen as particularly helpful by the curriculum
committee. The school districts on the basis of the findings did adopt the
new social studies curriculum. While criterion-referenced evaluation provided
important information for aiding the school board in adopting a drastically
different social studies program, it was not considered sufficient for making
the adoption decision, nor was it considered sufficient by the curriculum
committee of administrators to direct inservice efforts. The field
investigation found emerging at the decision-making level the issues of
measurement that have emerged in the theoretical literature on CRTs. Because
a decision was required, there had to be a resolution of these issues. When a
better data base for making adoptions and for directing an inservice program
was sought, both approaches for analysis of the findings were employed. More

importantly the CRT data are not an adequate substitute for NRT data in
curriculum evaluation where a school district materials adoption decision
is at stake.


Objective 12. The student should be able to cite examples of different
              attitudes and beliefs held by persons in their community.



23. If you could tell only one person about your visit to a baseball game,
    who do you feel would be most interested in listening? Circle the
    picture of this one person.





    YOUR DOCTOR          YOUR GYM TEACHER          A STORE CLERK



24. There are some buildings all people go to and there are other
    buildings that only some people go to. Circle the building to which
    only some people go.







    SUPERMARKET          DEPARTMENT STORE          CHURCH



LEVEL THREE

Objective 12. The student will be able to make two lists, one citing
              direct and the other indirect costs of crime.

48. When a burglar steals from someone's house there are direct
    and indirect costs of the burglary. A direct cost is the money
    spent because of the specific burglary. An indirect cost is the
    money spent to avoid future burglaries.
    A burglar steals two TV sets, one radio, a record player, and many
    small items. Place an "X" on the line by four of the sentences
    that are examples of an Indirect Cost of this burglary.

    ___ [illegible] TO REPLACE THE STOLEN ONE
    ___ THE FAMILY BUYS A NEW RADIO TO REPLACE THE STOLEN ONE
    ___ THE FAMILY PUTS AN EXTRA LOCK ON THE DOOR
    ___ MORE POLICEMEN ARE HIRED TO WATCH THE NEIGHBORHOOD
    ___ A NEW JAIL IS BUILT IN TOWN
    ___ A WATCH DOG IS BOUGHT BY THE FAMILY
    ___ THE FAMILY SAVES MONEY BY NOT LEAVING FOR A VACATION
    ___ THE FAMILY DISCOVERS A WEEK LATER THAT THE TOASTER IS MISSING



LEVEL FIVE



Objective 12. The student will be able to cite several examples of
              episodes which challenged the existing social system
              and describe whether those challenges moved the system
              closer to or farther from the American ideals.



48. Choose the four events which have challenged the present
    social system and have moved the system closer to American
    ideals.

    government can record private conversations of citizens
    giving everyone a lawyer when he/she is arrested for a crime
    stopping the printing of news that criticizes politicians
    parents refusing to send children to school
    giving everyone an equal chance to qualify for a job
    giving everyone the right to vote
    having only one major political party
    government protecting the right of every citizen to purchase
        or rent a house in any community



